Distributional Reinforcement Learning via Moment Matching
Authors
Abstract
We consider the problem of learning a set of probability distributions from the empirical Bellman dynamics in distributional reinforcement learning (RL), a class of state-of-the-art methods that estimate the distribution, as opposed to only the expectation, of the total return. We formulate a method that learns a finite set of statistics of each return distribution via neural networks, as in the distributional RL literature. Existing methods, however, constrain the learned statistics to predefined functional forms of the return distribution, which is both restrictive in representation and difficult in maintaining the predefined statistics. Instead, we learn unrestricted statistics, i.e., deterministic (pseudo-)samples, of the return distribution by leveraging a technique from hypothesis testing known as maximum mean discrepancy (MMD), which leads to a simpler objective amenable to backpropagation. Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its target. We establish sufficient conditions for the contraction of the distributional Bellman operator and provide a finite-sample analysis of the deterministic samples in distribution approximation. Experiments on the suite of Atari games show that our method outperforms the standard distributional RL baselines and sets a new record among non-distributed agents.
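Since the abstract describes learning deterministic (pseudo-)samples of the return distribution by minimizing an MMD objective against a Bellman target, the following is a minimal sketch of such a loss in PyTorch. It is not the authors' released implementation: the Gaussian mixture kernel, the bandwidths, the particle count of 32, and the discount of 0.99 are illustrative assumptions.

# Minimal sketch (not the paper's released code) of a squared-MMD loss
# between predicted return particles and Bellman-target particles.
import torch

def gaussian_mixture_kernel(x, y, bandwidths=(1.0, 2.0, 4.0, 8.0)):
    # Pairwise mixture-of-Gaussians kernel values; x: (N,), y: (M,) -> (N, M).
    sq_dist = (x.unsqueeze(1) - y.unsqueeze(0)) ** 2
    return sum(torch.exp(-sq_dist / (2.0 * h ** 2)) for h in bandwidths)

def mmd_squared(pred, target):
    # Biased empirical estimate of MMD^2 between two particle sets.
    k_pp = gaussian_mixture_kernel(pred, pred).mean()
    k_tt = gaussian_mixture_kernel(target, target).mean()
    k_pt = gaussian_mixture_kernel(pred, target).mean()
    return k_pp + k_tt - 2.0 * k_pt

# Illustrative usage: 32 particles predicted for Z(s, a) and a detached
# Bellman target r + gamma * Z(s', a') built from next-state particles.
pred = torch.randn(32, requires_grad=True)
with torch.no_grad():
    target = 1.0 + 0.99 * torch.randn(32)
loss = mmd_squared(pred, target)
loss.backward()  # gradients flow only into the predicted particles

With a characteristic kernel, driving the empirical MMD between the two particle sets to zero matches their distributions, which is the sense in which the abstract speaks of implicitly matching all orders of moments between a return distribution and its target.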
Similar Resources
A Distributional Perspective on Reinforcement Learning
In this paper we argue for the fundamental importance of the value distribution: the distribution of the random return received by a reinforcement learning agent. This is in contrast to the common approach to reinforcement learning which models the expectation of this return, or value. Although there is an established body of literature studying the value distribution, thus far it has always be...
A Distributional Perspective on Reinforcement Learning
To the best of our knowledge, the works closest to ours are two papers (Morimura et al., 2010b;a) studying the distributional Bellman equation from the perspective of its cumulative distribution functions. The authors propose both parametric and nonparametric solutions to learn distributions for risk-sensitive reinforcement learning. They also provide some theoretical analysis for the policy eva...
Distributional Reinforcement Learning with Quantile Regression
In reinforcement learning an agent interacts with the environment by taking actions and observing the next state and reward. When sampled probabilistically, these state transitions, rewards, and actions can all induce randomness in the observed long-term return. Traditionally, reinforcement learning algorithms average over this randomness to estimate the value function. In this paper, we build ...
Reinforcement Learning by Probability Matching
We present a new algorithm for associative reinforcement learning. The algorithm is based upon the idea of matching a network's output probability with a probability distribution derived from the environment's reward signal. This Probability Matching algorithm is shown to perform faster and be less susceptible to local minima than previously existing algorithms. We use Probability Matching to t...
RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning
Deep reinforcement learning (deep RL) has been successful in learning sophisticated behaviors automatically; however, the learning process requires a huge number of trials. In contrast, animals can learn new tasks in just a few trials, benefiting from their prior knowledge about the world. This paper seeks to bridge this gap. Rather than designing a “fast” reinforcement learning algorithm, we p...
Journal
Journal Title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2021
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v35i10.17104